Reducing False Sharing on Shared Memory Multiprocessors
نویسنده
چکیده
We have developed compiler algorithms that analyze coarse-grained, explicitly parallel programs and restructure their shared data to minimize the number of false sharing misses. The algorithms analyze the per-process data accesses to shared data, use this information to pinpoint the data structures that are prone to false sharing and choose an appropriate transformation to reduce it. The algorithms eliminated an average (across the entire workload) of 64% of false sharing misses, and in two programs more than 90%. However, how well the reduction in false sharing misses translated into improved execution time depended heavily on the memory subsystem architecture and previous programmer e orts to optimize for locality. On a multiprocessor with a large cache con guration and high cache miss penalty, the transformations improved the execution time of programmer-unoptimized applications by as much as 60%. However, on programs where previous programmer e orts to improve data locality had reduced the original amount of false sharing, and on a multiprocessor with a small cache con guration and cache miss penalty, the gains were more modest.
منابع مشابه
Computation and Data Partitioning on Scalable Shared Memory Multiprocessors
In this paper we identify the factors that affect the derivation of computation and data partitions on scalable shared memory multiprocessors (SSMMs). We show that these factors necessitate an SSMM-conscious approach. In addition to remote memory access, which is the sole factor on distributed memory multiprocessors, cache affinity, memory contention and false sharing are important factors that...
متن کاملExperiences with Data Distribution on NUMA Shared Memory Multiprocessors
The choice of a good data distribution scheme is critical to performance of data-parallel applications on both distributed memory multiprocessors and NUMA shared memory multiprocessors. The high cost of interprocessor communication in distributed memory multiprocessors makes the minimization of communications the predominant issue in selecting data distributionschemes. However, on NUMA multipro...
متن کاملSoftware Caching on Cache-Coherent Multiprocessors
Programmers have always been concerned with data distribution and remote memory access costs on shared-memory multiprocessors that lack coherent caches, like the BBN Butterry. Recently memory latency has become an important issue on cache-coherent multiprocessors, where dramatic improvements in microprocessor performance have increased the relative cost of cache misses and coherency transaction...
متن کاملFalse Sharing Elimination by Selection of Runtime Scheduling Parameters
False sharing can be a source of signiicant overhead on shared-memory multiprocessors. Several program restructuring techniques to reduce false sharing have been proposed in past work. In this paper, we propose an approach for elimination of false sharing based solely on selection of runtime schedule parameters for parallel loops. This approach leads to more portable code since only the schedul...
متن کاملReducing False Sharing and Improving Spatial Locality in a Unified Compilation Framework
The performance of applications on large shared-memory multiprocessors with coherent caches depends on the interaction between the granularity of data sharing, the size of the coherence unit, and the spatial locality exhibited by the applications, in addition to the amount of parallelism in the applications. Large coherence units are helpful in exploiting spatial locality, but worsen the effect...
متن کامل